release: SKaiNET-transformers 0.30.0 by michalharakal · Pull Request #177 · SKaiNET-developers/SKaiNET-transformers

michalharakal · 2026-06-14T18:19:39Z

Prepares the 0.30.0 release, version-aligned with the released SKaiNET 0.30.0 (Q5_K packed matmul, NEON native kernels, Kotlin/Native cinterop). Skips 0.29.x — tracked internally without a tagged release.

Headline

Q5_K stays packed in the eager Gemma runtime. GemmaMemSegConverter used to dequantize Q5_K weights to FP32 on load; the engine now provides a first-class Q5_K packed matmul (Q5_KBlockTensorData + Q5KMatmulKernel), so weights stay packed (176 B/block). FunctionGemma-270M (Q5_K_M) decodes byte-identically to the FP32 baseline (GemmaQ5KPackedParityTest).
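The 176 B/block figure follows from the Q5_K super-block layout used by GGUF. Each block holds 256 weights: an fp16 scale `d`, an fp16 `dmin`, 12 B of packed 6-bit scales and mins, 32 B of high bits (one per weight) and 128 B of low nibbles. A small sketch of the resulting storage cost; the object and function names are illustrative only, not SKaiNET API.

```kotlin
// Sketch: storage cost of a Q5_K weight matrix vs. FP32, assuming the GGUF/ggml Q5_K layout.
object Q5K {
    const val QK_K = 256 // weights per super-block
    // fp16 d + fp16 dmin + 12 B scales/mins + QK_K/8 B high bits + QK_K/2 B low nibbles
    const val BLOCK_BYTES = 2 + 2 + 12 + QK_K / 8 + QK_K / 2
}

// Bytes needed to keep a [rows, cols] matrix packed as Q5_K.
fun q5kPackedBytes(rows: Int, cols: Int): Long {
    require(cols % Q5K.QK_K == 0) { "row length $cols is not a multiple of ${Q5K.QK_K}" }
    return rows.toLong() * (cols / Q5K.QK_K) * Q5K.BLOCK_BYTES
}

// Bytes needed for the same matrix after dequantizing to FP32.
fun fp32Bytes(rows: Int, cols: Int): Long = rows.toLong() * cols * 4

// Effective storage per weight in the packed form.
fun q5kBitsPerWeight(): Double = Q5K.BLOCK_BYTES * 8.0 / Q5K.QK_K
```

For a hypothetical 1024 x 1024 projection this gives 720,896 B packed versus 4,194,304 B in FP32, i.e. 5.5 bits per weight, which is the saving the old dequantize-on-load path gave up.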
Gemma NATIVE_OPTIMIZED path is Kotlin/Native–ready. The layout + packing helpers (GemmaQuantLayout.kt, GemmaPackedWeights.kt) moved to commonMain, and GemmaNetworkLoader.load() now runs convertGemmaWeightsPacked — the board binary keeps K-quant weights packed with no java.lang.foreign MemSeg dependency. Verified on JVM and linuxX64.
Fixes. Kernel-less quant types under NATIVE_OPTIMIZED now dequantize to FP32 [out, in] instead of crashing on a rank-1 transpose; DecoderGgufMemSegConverter dequantizes Q4_1 and every other non-packed quant type (#654).
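Dequantizing a kernel-less type such as Q4_1 straight into an [out, in] FP32 matrix needs no transpose, because GGUF stores each output row's blocks contiguously. A sketch assuming the ggml Q4_1 layout (32 weights per 20-byte block: fp16 scale `d`, fp16 min `m`, 16 bytes of 4-bit values, with w = d * q + m, low nibbles filling the first half of the block and high nibbles the second). The function names are hypothetical; this is not DecoderGgufMemSegConverter's code.

```kotlin
import kotlin.math.pow

// Decode IEEE 754 binary16 (given as its 16 raw bits) to Float.
fun halfToFloat(bits: Int): Float {
    val sign = if (((bits ushr 15) and 1) == 1) -1f else 1f
    val exp = (bits ushr 10) and 0x1F
    val mant = bits and 0x3FF
    return sign * when (exp) {
        0 -> mant * 2f.pow(-24)                                    // zero / subnormal
        0x1F -> if (mant == 0) Float.POSITIVE_INFINITY else Float.NaN
        else -> (1f + mant / 1024f) * 2f.pow(exp - 15)             // normal
    }
}

// Read an unsigned little-endian 16-bit value.
fun u16le(b: ByteArray, off: Int): Int =
    (b[off].toInt() and 0xFF) or ((b[off + 1].toInt() and 0xFF) shl 8)

// Expand raw Q4_1 bytes into an [out][in] FP32 matrix (rows = output features).
fun dequantQ4_1(raw: ByteArray, outFeatures: Int, inFeatures: Int): Array<FloatArray> {
    val qk = 32          // weights per block
    val blockBytes = 20  // 2 (d) + 2 (m) + 16 (32 x 4-bit)
    require(inFeatures % qk == 0) { "in=$inFeatures is not a multiple of $qk" }
    val blocksPerRow = inFeatures / qk
    require(raw.size == outFeatures * blocksPerRow * blockBytes) { "unexpected byte count ${raw.size}" }
    return Array(outFeatures) { r ->
        val row = FloatArray(inFeatures)
        for (b in 0 until blocksPerRow) {
            val base = (r * blocksPerRow + b) * blockBytes
            val d = halfToFloat(u16le(raw, base))
            val m = halfToFloat(u16le(raw, base + 2))
            for (j in 0 until qk / 2) {
                val q = raw[base + 4 + j].toInt() and 0xFF
                row[b * qk + j] = (q and 0x0F) * d + m         // low nibble -> first half
                row[b * qk + j + qk / 2] = (q ushr 4) * d + m  // high nibble -> second half
            }
        }
        row
    }
}
```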

Release prep in this PR

gradle.properties: VERSION_NAME 0.28.1 → 0.30.0 (catalog skainet already pinned to 0.30.0).
settings.gradle.kts: reverted the mavenLocal()-first dev shim — 0.30.0 is on Maven Central; the -PuseLocalSkainet composite build is unchanged.
CHANGELOG.md: [0.30.0] entry + tag link.
README.md + doc tutorials: "Current release" / BOM coordinates → 0.30.0; new "What's new in 0.30.0".
API dumps refreshed (./gradlew apiDump). jvmApiCheck had flagged stale dumps; all deltas reflect public API already in the source — the 0.23.3 prefill callback (llm-agent), convertGemmaWeightsPacked (gemma), and the KClass dtype param on the vendored transformer modules (llm-core).

Validation

./gradlew build — BUILD SUCCESSFUL in 3m 3s, no failed tasks (compilation, tests, all apiCheck variants).

Integration-tagged tests (-PincludeIntegration, e.g. GemmaQ5KPackedParityTest) are not part of the default build and were not run in this pass.

🤖 Generated with Claude Code

FunctionGemma-270M ships as Q5_K_M, but GemmaMemSegConverter dequantized Q5_K weights to FP32 on load ("no native matmul kernel yet for Q5_K"), losing the memory savings and the in-kernel dequant. Upstream SKaiNET 0.29.1 now provides a first-class Q5_K packed matmul (Q5_KBlockTensorData + Q5KMatmulKernel: scalar/Panama/native), so keep Q5_K packed here too: relayout GGUF bytes to block-major and wrap them as Q5_KBlockTensorData (176 B/block). Dispatch and lazy transpose reach it via DefaultCpuOps.

- Bump skainet 0.28.1 -> 0.29.1 (source of truth for the llm-bom platform).
- settings.gradle.kts: mavenLocal first, so a locally published SKaiNET 0.29.1 (carrying the in-progress Q5_K kernel) shadows Maven Central until it is released; Central remains the fallback.

Verified (GemmaQ5KPackedParityTest, -PincludeIntegration): the Q5_K packed path decodes FunctionGemma byte-identically to the FP32 baseline: [262146, 236769, 3255, 718, 498, 1373, 262152, 106] -> `<tool_0>(state="on") <end>` for "Turn the light on." (the known-good tool call), 0.81 tok/s on the JVM host including prefill.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
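The dequantize-on-load behaviour this commit removes expands every 176-byte block into 256 floats. A minimal Kotlin port of that per-block expansion, following ggml's reference `dequantize_row_q5_K` and the standard GGUF Q5_K layout (d, dmin, 12 B scales, 32 B qh, 128 B qs). The names are ours; SKaiNET's Q5KMatmulKernel instead dequantizes inside the matmul and is not shown here.

```kotlin
import kotlin.math.pow

// Decode IEEE 754 binary16 (16 raw bits) to Float.
fun halfToFloat(bits: Int): Float {
    val sign = if (((bits ushr 15) and 1) == 1) -1f else 1f
    val exp = (bits ushr 10) and 0x1F
    val mant = bits and 0x3FF
    return sign * when (exp) {
        0 -> mant * 2f.pow(-24)
        0x1F -> if (mant == 0) Float.POSITIVE_INFINITY else Float.NaN
        else -> (1f + mant / 1024f) * 2f.pow(exp - 15)
    }
}

fun u16le(b: ByteArray, off: Int): Int =
    (b[off].toInt() and 0xFF) or ((b[off + 1].toInt() and 0xFF) shl 8)

// Unpack the 6-bit (scale, min) pair for sub-block j in 0..7 (ggml get_scale_min_k4).
fun scaleMin(j: Int, q: ByteArray, off: Int): Pair<Int, Int> {
    fun byte(i: Int) = q[off + i].toInt() and 0xFF
    return if (j < 4) {
        Pair(byte(j) and 63, byte(j + 4) and 63)
    } else {
        Pair(
            (byte(j + 4) and 0x0F) or ((byte(j - 4) ushr 6) shl 4),
            (byte(j + 4) ushr 4) or ((byte(j) ushr 6) shl 4),
        )
    }
}

// Dequantize one 176-byte Q5_K block (256 weights) starting at `off`.
fun dequantQ5KBlock(block: ByteArray, off: Int = 0): FloatArray {
    val d = halfToFloat(u16le(block, off))
    val dmin = halfToFloat(u16le(block, off + 2))
    val scales = off + 4 // 12 bytes of packed 6-bit scales and mins
    val qh = off + 16    // 32 bytes: bit k of qh[l] is the 5th bit in sub-block k
    val qs = off + 48    // 128 bytes: two 4-bit values per byte
    val y = FloatArray(256)
    var n = 0
    for (pair in 0 until 4) { // four groups of 64 weights, two sub-blocks each
        val (sc1, m1) = scaleMin(2 * pair, block, scales)
        val (sc2, m2) = scaleMin(2 * pair + 1, block, scales)
        val ql = qs + 32 * pair
        val u1 = 1 shl (2 * pair)
        val u2 = 2 shl (2 * pair)
        for (l in 0 until 32) {
            val q = (block[ql + l].toInt() and 0x0F) + if ((block[qh + l].toInt() and u1) != 0) 16 else 0
            y[n++] = d * sc1 * q - dmin * m1
        }
        for (l in 0 until 32) {
            val q = ((block[ql + l].toInt() and 0xFF) ushr 4) + if ((block[qh + l].toInt() and u2) != 0) 16 else 0
            y[n++] = d * sc2 * q - dmin * m2
        }
    }
    return y
}
```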

…ard path

The board binary is Kotlin/Native, but GemmaMemSegConverter (the NATIVE_OPTIMIZED packed-weight path) is jvmMain-only (java.lang.foreign). Move the reusable, platform-neutral pieces to commonMain so K/N can keep K-quant weights packed:

- GemmaQuantLayout.kt (commonMain): logicalShapeFor + relayoutKSeriesRowMajorToBlockMajor (now copyInto, KMP-safe) + packGemmaKQuant<T>(), which builds heap-packed Q4_K/Q5_K/Q6_KBlockTensorData directly (no MemSeg/Arena).
- GemmaMemSegConverter (jvmMain) now shares those commonMain helpers (duplicate removed); MemSeg/FFM conversion and FP32 fallbacks stay JVM-only.
- commonTest GemmaQuantLayoutTest: block-transpose relayout + packing, runs on every target.

Verified: gemma compiles for JVM + linuxX64; layout tests green (3).

Next (board integration): a commonMain convertGemmaWeightsPacked wired into the K/N load path (byte extraction differs between JVM IntArrayTensorData and native Byte-backed data), then a full K/N decode on the SL2610.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
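The commit does not define "block-major" beyond calling the test a block-transpose relayout. The sketch below assumes it means transposing the [rows][blocksPerRow] grid of fixed-size quant blocks into [blocksPerRow][rows]. It uses `ByteArray.copyInto`, which lives in the Kotlin common standard library and therefore compiles for both the JVM and Kotlin/Native. The function name and signature are hypothetical, not the real relayoutKSeriesRowMajorToBlockMajor.

```kotlin
// Hypothetical block-transpose: row-major [rows][blocksPerRow] -> block-major [blocksPerRow][rows].
fun relayoutRowMajorToBlockMajor(
    src: ByteArray,
    rows: Int,
    blocksPerRow: Int,
    blockBytes: Int,
): ByteArray {
    require(src.size == rows * blocksPerRow * blockBytes) { "size mismatch: ${src.size}" }
    val dst = ByteArray(src.size)
    for (r in 0 until rows) {
        for (b in 0 until blocksPerRow) {
            val from = (r * blocksPerRow + b) * blockBytes // block b of row r in the source
            val to = (b * rows + r) * blockBytes           // row r within block column b
            // copyInto is common stdlib (no java.lang.foreign / System.arraycopy needed).
            src.copyInto(dst, destinationOffset = to, startIndex = from, endIndex = from + blockBytes)
        }
    }
    return dst
}
```

Whole blocks are moved, never split, so each 176-byte Q5_K block (or its Q4_K/Q6_K counterpart) stays intact for the kernel.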

…oad()

NATIVE_OPTIMIZED loads produce raw-byte quant tensors the network mapper can't consume; on JVM an external convertGemmaWeightsToMemSeg (FFM) handled that, but the Kotlin/Native board has no such path. Add a commonMain converter and make load() apply it, so load(NATIVE_OPTIMIZED) yields a runnable network on both the board and the JVM (previously it couldn't be built from raw-byte weights at all).

- GemmaPackedWeights.kt (commonMain): convertGemmaWeightsPacked packs Q4/5/6_K matmul weights to heap Q*_KBlockTensorData (packGemmaKQuant), dequantizes token_embd/output to FP32 (gathered, no transpose) and other quant types to FP32 [out,in]. No java.lang.foreign. Plus extractRawBytes, which reads the loader's bytes back across both backings (JVM IntArrayTensorData / native Byte-typed).
- GemmaNetworkLoader.load(): for NATIVE_OPTIMIZED, run convertGemmaWeightsPacked before applyWeightsToNetwork.

Verified on JVM AND linuxX64 (GemmaQuantLayoutTest, 4 tests each): relayout, packing, and the byte-extraction round-trip, so native byte extraction is executed, not just compiled.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
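extractRawBytes has to hand back the same bytes regardless of how the loader stored them. A hedged sketch of that idea: `IntBacked` and `ByteBacked` are hypothetical stand-ins for IntArrayTensorData and the native Byte-typed backing, and the sketch assumes the Int backing holds one byte value per element (the commit does not say how bytes are packed there). The test follows the spirit of the byte-extraction round-trip the commit describes.

```kotlin
// Hypothetical abstraction over the loader's two storage backings.
interface RawStorage {
    val size: Int
    fun byteAt(index: Int): Byte
}

// Stand-in for JVM IntArrayTensorData. Assumption: one byte value per Int element.
class IntBacked(private val values: IntArray) : RawStorage {
    override val size: Int get() = values.size
    override fun byteAt(index: Int): Byte = values[index].toByte() // keeps the low 8 bits
}

// Stand-in for the Kotlin/Native Byte-typed backing.
class ByteBacked(private val bytes: ByteArray) : RawStorage {
    override val size: Int get() = bytes.size
    override fun byteAt(index: Int): Byte = bytes[index]
}

// Hypothetical counterpart of extractRawBytes: same bytes out, whichever backing was used.
fun extractRawBytesSketch(storage: RawStorage): ByteArray =
    ByteArray(storage.size) { storage.byteAt(it) }
```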

Extends GemmaQ5KPackedParityTest to also decode via GemmaNetworkLoader.load(NATIVE_OPTIMIZED), the wired commonMain convertGemmaWeightsPacked (board) path with no MemSeg/Arena. All three paths (FP32 baseline, jvmMain MemSeg-packed, load() packed) produce the identical token sequence -> `<tool_0>(state="on")<end>` for "Turn the light on."

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Six real-model integration tests (RealGemmaLoad/Eager/BakeIrpa/ExternalParam/DequantDump + GemmaBehavioralAb) pointed at an old workspace path (/home/miso/projects/coral/sl2610-voice-cc-kt/models/...) and failed with "File not found" under -PincludeIntegration. Repoint them to the actual model location (SKaiNET-embedded/sl2610-function-calling/models/), matching GemmaQ5KPackedParityTest.

Verified: all 6 pass against skainet 0.30.0 (mavenLocal), -PincludeIntegration.

Version-aligned with the released SKaiNET 0.30.0 (Q5_K packed matmul, NEON native kernels, Kotlin/Native cinterop), already pinned in the catalog.

- gradle.properties: VERSION_NAME 0.28.1 -> 0.30.0.
- settings.gradle.kts: revert the mavenLocal()-first dev shim (0.30.0 is on Maven Central; the -PuseLocalSkainet composite build stays for local work).
- CHANGELOG.md: add the [0.30.0] entry (Q5_K packed eager runtime, K/N-ready NATIVE_OPTIMIZED Gemma path, kernel-less/Q4_1 dequant fixes) + tag link.
- README.md: bump "Current release" + BOM snippet to 0.30.0; add "What's new in 0.30.0".
- docs tutorials: bump BOM coordinates 0.28.1 -> 0.30.0.

No merge, no tag.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`./gradlew build` runs `jvmApiCheck`, which flagged the committed `.api` dumps as stale. Regenerated via `./gradlew apiDump`; all changes reflect public API already present in the source on this branch:

- llm-agent: the 0.23.3 prefill-progress callback. `generateUntilStop` gained its `onPrefill` `Function2` param and `AgentListener` gained `onPrefillProgress(Int, Int)`; the dump was never refreshed.
- llm-inference/gemma: `convertGemmaWeightsPacked`, the commonMain packed-weight converter added for the Kotlin/Native NATIVE_OPTIMIZED path.
- llm-core: trailing `KClass` dtype param on the vendored transformer modules (AttentionImpl / RMSNormalization / GeGLUFFN / MultiHeadAttention / LayerScalarMul / VoidDense) from earlier engine-aligned work.

`./gradlew build` now green end-to-end (3m 3s, no failed tasks).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
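The `Function2` in the refreshed llm-agent dump is how the JVM ABI represents a Kotlin `(Int, Int) -> Unit` parameter, which is why `onPrefill` shows up under that name in the `.api` file. A hedged sketch of that callback shape; `PrefillProgressListener`, `prefillInChunks`, and the (processed, total) meaning of the two Ints are assumptions, not the real AgentListener or generateUntilStop.

```kotlin
// Hypothetical stand-in mirroring only the onPrefillProgress(Int, Int) member named in the dump.
interface PrefillProgressListener {
    fun onPrefillProgress(processed: Int, total: Int)
}

// Hypothetical chunked prefill. On the JVM, the (Int, Int) -> Unit parameter compiles to
// kotlin.jvm.functions.Function2, which is what the binary-compatibility dump records.
fun prefillInChunks(
    promptTokens: IntArray,
    chunkSize: Int,
    onPrefill: (Int, Int) -> Unit = { _, _ -> },
) {
    require(chunkSize > 0) { "chunkSize must be positive" }
    var done = 0
    while (done < promptTokens.size) {
        done = minOf(done + chunkSize, promptTokens.size)
        // A real runtime would run the forward pass for this chunk here.
        onPrefill(done, promptTokens.size)
    }
}
```

A listener can be plugged in with a bound reference, e.g. `prefillInChunks(tokens, 4, listener::onPrefillProgress)`, so the same progress events reach both the lambda-style and listener-style APIs.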

…st blocks

The real-model FunctionGemma-270M integration tests (-PincludeIntegration) OOM'd with `Java heap space` at the previous 8g default once the model file is present: GemmaQ5KPackedParityTest holds the FP32 baseline plus both packed decode networks at once, and the bake-to-irpa test holds weights + serialized bytes simultaneously.

- Bump the `gemmaTestMaxHeap` default 8g -> 12g.
- Merge the two overlapping `tasks.withType<Test>().configureEach { }` blocks into one; the second silently overrode the first's maxHeapSize (so jvmArgs ran with 6g declared but 8g effective). Now jvmArgs, heap, and the seqLen system property live in a single block.

CI is unaffected: without the model file the integration tests self-skip and never allocate the headroom.

Verified: `:llm-inference:gemma:jvmTest -PincludeIntegration` green with no -P override (87 tests, 6 skipped, 0 failures); GemmaQ5KPackedParityTest runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
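Why the merge matters: Gradle applies `configureEach` actions in the order they were registered, so when two blocks both set `maxHeapSize`, the later assignment wins. A sketch of a single consolidated block in `build.gradle.kts`; only the `gemmaTestMaxHeap` property and the 12g default come from the commit, while the JVM flag and the seqLen property names are hypothetical placeholders.

```kotlin
// build.gradle.kts (sketch): one Test configuration block, so nothing later overrides the heap.
val gemmaTestMaxHeap = providers.gradleProperty("gemmaTestMaxHeap").getOrElse("12g")

tasks.withType<Test>().configureEach {
    // Heap: 12g default, still overridable with -PgemmaTestMaxHeap=<size>.
    maxHeapSize = gemmaTestMaxHeap
    // Hypothetical flag; the commit does not list the actual jvmArgs.
    jvmArgs("--add-modules", "jdk.incubator.vector")
    // Hypothetical property names for the seqLen system property.
    systemProperty("gemma.seqLen", providers.gradleProperty("gemmaSeqLen").getOrElse("512"))
}
```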

michalharakal and others added 9 commits June 10, 2026 23:41

build: consume skainet 0.30.0 (released Q5_K + NEON + K/N cinterop)

a222b2a

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

michalharakal merged commit eb505fe into develop Jun 14, 2026
6 checks passed

michalharakal deleted the release/0.30.0 branch June 14, 2026 19:41

michalharakal merged 9 commits into develop from release/0.30.0

michalharakal commented Jun 14, 2026


1 participant
